[Flang] Adjust the trampoline size for AArch64 and PPC #118678

ssijaric-nv · 2024-12-04T18:14:45Z

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64, defined in compiler-rt/lib/builtins/trampoline_setup.c. During AArch64
and PPC lowering, init.trampoline is lowered to a call to __trampoline_setup, with the
corresponding trampoline sizes passed to it. On x86_64 and RISCV, __trampoline_setup
isn't used, and the setup fits within 32 bytes.

github-actions · 2024-12-04T18:18:22Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2024-12-05T00:51:16Z

@llvm/pr-subscribers-flang-codegen

Author: None (ssijaric-nv)

Changes

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64. During AArch64 and PPC lowering, init.trampoline is lowered
to a call to __trampoline_setup, with the corresponding trampoline sizes
passed to it.

Full diff: https://github.com/llvm/llvm-project/pull/118678.diff

2 Files Affected:

(modified) flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp (+9-1)
(modified) flang/test/Fir/boxproc.fir (+9-3)

diff --git a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
index c536fd19fcc69a..e1c9e97d0b535b 100644
--- a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
+++ b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
@@ -268,10 +268,18 @@ class BoxedProcedurePass
             // Create the thunk.
             auto module = embox->getParentOfType<mlir::ModuleOp>();
             FirOpBuilder builder(rewriter, module);
+            const auto triple{fir::getTargetTriple(module)};
             auto loc = embox.getLoc();
             mlir::Type i8Ty = builder.getI8Type();
             mlir::Type i8Ptr = builder.getRefType(i8Ty);
-            mlir::Type buffTy = SequenceType::get({32}, i8Ty);
+            fir::SequenceType::Extent thunkSize = 32;
+            if (triple.isPPC32())
+              thunkSize = 40;
+            else if (triple.isPPC64())
+              thunkSize = 48;
+            else if (triple.isAArch64())
+              thunkSize = 36;
+            mlir::Type buffTy = SequenceType::get({thunkSize}, i8Ty);
             auto buffer = builder.create<AllocaOp>(loc, buffTy);
             mlir::Value closure =
                 builder.createConvert(loc, i8Ptr, embox.getHost());
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 9e4ea0bc210775..fbc1329892cbc7 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -1,7 +1,11 @@
-// RUN: tco %s | FileCheck %s
+// RUN: %if aarch64-registered-target %{tco --target=aarch64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-AARCH64 %}
+// RUN: %if x86-registered-target %{tco --target=x86_64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-X86 %}
+// RUN: %if powerpc-registered-target %{tco --target=powerpc64le-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-PPC %}
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy()
-// CHECK:         %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_3:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_3:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_1:.*]] = alloca { ptr }, i64 1, align 8
 // CHECK:         %[[VAL_0:.*]] = alloca i32, i64 1, align 4
 // CHECK:         %[[VAL_2:.*]] = getelementptr { ptr }, ptr %[[VAL_1]], i32 0, i32 0
@@ -59,7 +63,9 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 }
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy_char()
-// CHECK:         %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_20:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_20:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_2:.*]] = alloca { { ptr, i64 } }, i64 1, align 8
 // CHECK:         %[[VAL_1:.*]] = alloca [10 x i8], i64 1, align 1
 // CHECK:         %[[VAL_0:.*]] = alloca [40 x i8], i64 1, align 1

llvmbot · 2024-12-05T00:51:17Z

@llvm/pr-subscribers-flang-fir-hlfir

ssijaric-nv · 2024-12-05T00:55:42Z

There is a separate PR under #118234 to pass the correct size to __trampoline_setup during AArch64 lowering.

jeanPerier

Thank you!

tblah · 2024-12-05T14:05:52Z

Please could you provide links to the documentation you used, preferably both in code comments and the commit summary

Leporacanthicus · 2024-12-05T14:23:39Z

Not a problem for this patch, but it would be REALLY nice if there was a function that could be called in LLVM to get the correct size for a trampoline, rather than each different user of LLVM having to implement their own variant of

if processorX size = A; 
else if processorY size = B; ...

Subject to providing some documentation for "why these values" (I understand what they represent and sort of why they need to be different, but it'd be good to have a source for what they are on different architectures).

ssijaric-nv · 2024-12-06T02:15:19Z

Thanks, all. Added a comment. I'll add a function to LLVM to retrieve the trampoline size, and will then push a follow-on patch to replace the if...then...else checks.
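
A minimal sketch of what such a shared helper might look like. The `Arch` enum and the `trampolineSize` name are hypothetical stand-ins used only for illustration (a real helper would presumably key off `llvm::Triple`); the byte counts are the ones in this patch's if/else chain, not values checked independently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the architecture component of a target triple.
enum class Arch { X86_64, RISCV64, AArch64, PPC32, PPC64 };

// Returns the number of bytes to reserve for a trampoline buffer.
// The values mirror the if/else chain in this patch; 32 is the
// default used for every architecture not listed explicitly.
constexpr std::uint64_t trampolineSize(Arch arch) {
  switch (arch) {
  case Arch::PPC32:
    return 40;
  case Arch::PPC64:
    return 48;
  case Arch::AArch64:
    return 36;
  default:
    return 32;
  }
}
```

With a helper like this, a caller such as the thunk creation in BoxedProcedure.cpp would need only a single call in place of its own per-target chain.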

kkwli · 2024-12-10T15:20:45Z

My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*.
@RolandF77 @mandlebug

ssijaric-nv · 2024-12-10T17:40:12Z

> My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*. @RolandF77 @mandlebug

Thank you for checking, Kelvin. I went by what's in PPCTargetLowering::LowerINIT_TRAMPOLINE, but these are only for Linux. Looks like it's safe to keep it at 32 bytes for all PPC targets, with the exception of ppc32 at 40 bytes?

kkwli · 2024-12-10T19:02:53Z

> > My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*. @RolandF77 @mandlebug
>
> Thank you for checking, Kelvin. I went by what's in PPCTargetLowering::LowerINIT_TRAMPOLINE, but these are only for Linux. Looks like it's safe to keep it at 32 bytes for all PPC targets, with the exception of ppc32 at 40 bytes?

I think it works for Linux. Thanks.
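
The reasoning in this exchange is that the buffer only has to be at least as large as what the target needs, so reserving more than necessary is harmless. A small sketch of that check, using the figures quoted above. The `Case` struct and `allCovered` name are hypothetical, and treating "ppc32 at 40 bytes" as applying to every 32-bit PPC target (including AIX) is an assumption about how the proposal should be read.

```cpp
#include <cassert>

struct Case {
  const char *target; // target pattern as written in the thread
  unsigned reported;  // bytes reported as needed for that target
  unsigned reserved;  // bytes the proposal would reserve (assumed reading)
};

// Figures taken from the comments above; the 'reserved' column is an
// interpretation of "32 bytes for all PPC targets, except ppc32 at 40".
constexpr Case kCases[] = {
    {"ppc64le", 32, 32},
    {"powerpc64-*-aix*", 24, 32},
    {"powerpc-*-aix*", 12, 40},
    {"ppc32 (Linux)", 40, 40},
};

// True when every reserved buffer is at least as large as the
// reported requirement for its target.
constexpr bool allCovered() {
  for (const Case &c : kCases)
    if (c.reserved < c.reported)
      return false;
  return true;
}

static_assert(allCovered(), "a reserved buffer is smaller than reported");
```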

tblah

Thanks!

ssijaric-nv marked this pull request as ready for review December 5, 2024 00:50

llvmbot added the flang, flang:fir-hlfir and flang:codegen labels Dec 5, 2024

jeanPerier approved these changes Dec 5, 2024

jeanPerier requested review from DavidTruby and kkwli December 5, 2024 11:02

kiranchandramohan requested review from davemgreen and tblah December 17, 2024 19:33

tblah approved these changes Dec 18, 2024

ssijaric-nv force-pushed the flang_trampoline_size branch from 5e5b89e to 273c5ed Compare January 22, 2025 04:17

ssijaric-nv added 5 commits January 22, 2025 10:32

Fix formatting

ae86c20

Minor cosmetic change

49f3b4d

Add comments

bd1d766

Move the thunk size determination to the Triple

2191d54

ssijaric-nv force-pushed the flang_trampoline_size branch from 273c5ed to 2191d54 Compare January 22, 2025 19:09

ssijaric-nv merged commit 16e9601 into llvm:main Jan 27, 2025
8 checks passed
